Unexpected Properties of Bandwidth Choice When Smoothing Discrete Data for Constructing a Functional Data Classifier.
Abstract
The data functions that are studied in the course of functional data analysis are assembled from discrete data, and the level of smoothing that is used is generally that which is appropriate for accurate approximation of the conceptually smooth functions that were not actually observed. Existing literature shows that this approach is effective, and even optimal, when using functional data methods for prediction or hypothesis testing. However, in the present paper we show that this approach is not effective in classification problems. There, a useful rule of thumb is that undersmoothing is often desirable, but there are several surprising qualifications to that approach. First, the effect of smoothing the training data can be more significant than that of smoothing the new data set to be classified; second, undersmoothing is not always the right approach, and in fact in some cases using a relatively large bandwidth can be more effective; and third, these perverse results are the consequence of very unusual properties of error rates, expressed as functions of smoothing parameters. For example, the orders of magnitude of optimal smoothing parameter choices depend on the signs and sizes of terms in an expansion of error rate, and those signs and sizes can vary dramatically from one setting to another, even for the same classifier.
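The setting described in the abstract can be explored with a small simulation. The sketch below is illustrative only and is not the paper's method: the Gaussian-kernel Nadaraya-Watson smoother, the centroid classifier, the two mean curves, the noise level, and the names `nw_smooth` and `error_rate` are all assumptions introduced here. It smooths discretely observed noisy curves with separate bandwidths for the training data and for the new data, then reports the empirical misclassification rate.

```python
import numpy as np


def nw_smooth(t_obs, Y, t_grid, h):
    """Nadaraya-Watson smoother with a Gaussian kernel and bandwidth h.

    Y has shape (n_curves, len(t_obs)); the result has shape
    (n_curves, len(t_grid)). Very small h relative to the spacing of
    t_obs can make all weights underflow, so keep h moderate.
    """
    u = (t_grid[:, None] - t_obs[None, :]) / h
    w = np.exp(-0.5 * u**2)            # kernel weights, shape (grid, obs)
    return (Y @ w.T) / w.sum(axis=1)   # weighted average at each grid point


def error_rate(h_train, h_test, n_train=50, n_test=500, noise_sd=1.0, seed=0):
    """Empirical error of a centroid classifier (hypothetical toy model)."""
    rng = np.random.default_rng(seed)
    t_obs = np.linspace(0.0, 1.0, 30)     # discrete observation points
    t_grid = np.linspace(0.0, 1.0, 100)   # grid for the smoothed curves
    base = np.sin(2 * np.pi * t_obs)
    # Assumed class means: they differ by a small high-frequency term.
    means = [base, base + 0.3 * np.sin(6 * np.pi * t_obs)]

    def sample(mean, n):
        # n noisy, discretely observed curves around the given mean
        return mean + rng.normal(0.0, noise_sd, size=(n, t_obs.size))

    # Class centroids from training curves smoothed with h_train.
    centroids = [
        nw_smooth(t_obs, sample(m, n_train), t_grid, h_train).mean(axis=0)
        for m in means
    ]

    errors = 0
    for label, m in enumerate(means):
        # New curves are smoothed with their own bandwidth h_test.
        X = nw_smooth(t_obs, sample(m, n_test), t_grid, h_test)
        d0 = ((X - centroids[0]) ** 2).sum(axis=1)
        d1 = ((X - centroids[1]) ** 2).sum(axis=1)
        predicted = (d1 < d0).astype(int)  # nearest centroid in squared L2 distance
        errors += int((predicted != label).sum())
    return errors / (2 * n_test)


if __name__ == "__main__":
    for h_train in (0.02, 0.05, 0.2):
        for h_test in (0.02, 0.05, 0.2):
            print(h_train, h_test, error_rate(h_train, h_test))
```

Varying `h_train` and `h_test` independently shows how the two bandwidths can be studied separately. The numbers depend entirely on the assumed toy model and should not be read as reproducing the paper's results.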
Similar resources
Functional Analysis of Iranian Temperature and Precipitation by Using Functional Principal Components Analysis
Extended Abstract. When data are in the form of continuous functions, they may challenge classical methods of data analysis based on arguments in finite dimensional spaces, and therefore need theoretical justification. Infinite dimensionality of spaces that data belong to, leads to major statistical methodologies and new insights for analyzing them, which is called functional data analysis (FDA...
Adaptive bandwidth selection in the long run covariance estimator of functional time series
In the analysis of functional time series an object which has seen increased use is the long run covariance function. It arises in several situations, including inference and dimension reduction techniques for high dimensional data, and new applications are being developed routinely. Given its relationship to the spectral density of finite dimensional time series, the long run covariance is nat...
On adaptive smoothing in kernel discriminant analysis
One popular application of kernel density estimation is in kernel discriminant analysis, where kernel estimates of population densities are plugged in the Bayes rule to develop a nonparametric classifier. Performance of these kernel density estimates and that of the corresponding classifier depend on the values of associated smoothing parameters commonly known as the bandwidths. Bandwidths that...
Detection of high impedance faults in distribution networks using Discrete Fourier Transform
In this paper, a new method for extracting dynamic properties for High Impedance Fault (HIF) detection using discrete Fourier transform (DFT) is proposed. Unlike conventional methods that use features extracted from data windows after fault to detect high impedance fault, in the proposed method, using the disturbance detection algorithm in the network, the normalized changes of the selected fea...
Persian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network
Handwritten digit recognition can be categorized as a classification problem. Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers, which works based on Bayesian rule. In this paper, in order to recognize Persian (Farsi) handwritten digit recognition, a combination of intelligent clustering method and PNN has been utilized. Hoda database, which includes 80000 P...
Journal title:
- Annals of Statistics
Volume 41, Issue 6
Pages -
Publication date 2013